D’Agostino’s \(K^2\) normality test (dagostino_k2)#

D’Agostino–Pearson’s \(K^2\) test is an omnibus normality test: it looks for departures from normality due to skewness and/or kurtosis.

Learning goals#

By the end you should be able to:

  • explain what \(K^2\) tests (and what it does not test)

  • compute skewness and kurtosis and connect them to distribution shape

  • implement the full test in NumPy only (low-level)

  • use Plotly visuals (histogram, Q–Q) to understand why normality fails

  • interpret the statistic and p-value responsibly

Table of contents#

  1. What the test is for

  2. Hypotheses + outputs

  3. Intuition: skewness + kurtosis

  4. NumPy-only implementation

  5. Visual diagnostics

  6. Geometry + chi-square interpretation

  7. Under the null: \(K^2 \sim \chi^2(2)\) and uniform p-values

  8. Interpreting the result

  9. Pitfalls + guidance

  10. Exercises + references

import os

import numpy as np
import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

np.set_printoptions(precision=6, suppress=True)

rng = np.random.default_rng(42)

1) What the test is for#

You typically use D’Agostino’s \(K^2\) test when you want a quick, automated check of whether a sample is plausibly normal.

Common workflows:

  • checking normality of residuals (linear regression / ANOVA diagnostics)

  • choosing between parametric vs non-parametric tests

  • deciding whether a normal noise model is a reasonable approximation

  • validating simulation outputs that are supposed to be normal

What it does not do:

  • it does not tell you which non-normal distribution you have

  • it does not “prove normality” when the p-value is large

2) Hypotheses + outputs#

  • \(H_0\) (null): the data are a random sample from a normal distribution (some \(\mu, \sigma\)).

  • \(H_1\) (alternative): the data are not from a normal distribution.

The test returns:

  • K2: an omnibus statistic (larger → more evidence against normality)

  • p_value: the upper-tail probability \(P(K^2 \ge K^2_{obs} \mid H_0)\)

Rules of thumb:

  • if p_value < α (e.g. 0.05), you reject normality at level α

  • if p_value ≥ α, you fail to reject (not the same as “accept normality”)

Sample size notes:

  • most references (and SciPy) require n ≥ 8

  • the kurtosis transform is asymptotic and becomes more reliable as n grows (you’ll often see guidance like “n > 20”)
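The decision rule above can be wrapped in a tiny helper; `k2_decision` is an illustrative name, not part of the test itself:

```python
def k2_decision(p_value, alpha=0.05):
    """Translate a K² p-value into the hedged wording above (illustrative helper)."""
    if p_value < alpha:
        return f"reject normality at level {alpha}"
    return f"fail to reject normality at level {alpha} (not the same as accepting it)"

print(k2_decision(0.012))  # reject normality at level 0.05
print(k2_decision(0.41))   # fail to reject normality at level 0.05 ...
```

Note that `p_value == alpha` falls on the fail-to-reject side; the strict inequality mirrors the rule of thumb above.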

3) Intuition: skewness + kurtosis#

\(K^2\) combines two shape diagnostics:

  • Skewness (3rd standardized moment): asymmetry

    • positive skew: long right tail

    • negative skew: long left tail

  • Kurtosis (4th standardized moment): tail heaviness / peakedness

    • normal has Pearson kurtosis = 3

    • heavier tails → larger kurtosis; lighter tails → smaller kurtosis

\(K^2\) is “omnibus” because it can reject normality if either skewness or kurtosis (or both) look unusual under \(H_0\).
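A quick numerical illustration of these two shape diagnostics (the printed values are Monte Carlo estimates; the theoretical values are noted in the comments):

```python
import numpy as np

rng = np.random.default_rng(0)

def shape_stats(x):
    # Sample skewness (3rd standardized moment) and Pearson kurtosis (4th).
    d = x - x.mean()
    m2 = np.mean(d**2)
    return np.mean(d**3) / m2**1.5, np.mean(d**4) / m2**2

# Theoretical (skewness, kurtosis): normal (0, 3), exponential (2, 9), uniform (0, 1.8).
for name, x in [
    ("normal", rng.normal(size=200_000)),
    ("exponential", rng.exponential(size=200_000)),
    ("uniform", rng.uniform(size=200_000)),
]:
    g1, b2 = shape_stats(x)
    print(f"{name:12s} skewness={g1:+.2f}  kurtosis={b2:.2f}")
```

The exponential sample shows up through skewness, while the uniform sample (symmetric but light-tailed) shows up only through kurtosis, which is exactly why an omnibus test needs both components.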

4) NumPy-only implementation (low-level)#

We implement the same basic steps as scipy.stats.normaltest:

  1. compute sample skewness and kurtosis

  2. transform each into an approximately standard-normal z-score

  3. combine them: \(K^2 = z_{skew}^2 + z_{kurt}^2\)

  4. compute the p-value from \(\chi^2(2)\)

def _as_1d_finite(x):
    x = np.asarray(x, dtype=float).ravel()
    return x[np.isfinite(x)]


def _central_moments_1d(x):
    x = _as_1d_finite(x)
    mean = x.mean()
    d = x - mean
    m2 = np.mean(d**2)
    m3 = np.mean(d**3)
    m4 = np.mean(d**4)
    return mean, m2, m3, m4


def sample_skewness(x):
    _, m2, m3, _ = _central_moments_1d(x)
    if m2 == 0:
        return np.nan
    return m3 / (m2 ** 1.5)


def sample_kurtosis_pearson(x):
    _, m2, _, m4 = _central_moments_1d(x)
    if m2 == 0:
        return np.nan
    return m4 / (m2 ** 2)  # normal → 3


def dagostino_skew_z(x):
    x = _as_1d_finite(x)
    n = x.size
    if n < 8:
        raise ValueError(f'dagostino_skew_z requires n >= 8; got n={n}.')

    g1 = sample_skewness(x)
    if not np.isfinite(g1):
        return np.nan

    y = g1 * np.sqrt(((n + 1) * (n + 3)) / (6.0 * (n - 2)))
    beta2 = (
        3.0
        * (n**2 + 27 * n - 70)
        * (n + 1)
        * (n + 3)
        / ((n - 2.0) * (n + 5) * (n + 7) * (n + 9))
    )
    W2 = -1.0 + np.sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / np.sqrt(0.5 * np.log(W2))
    alpha = np.sqrt(2.0 / (W2 - 1.0))

    # Exactly-zero y can happen for perfectly symmetric toy data; match common implementations.
    if y == 0.0:
        y = 1.0

    z = delta * np.log(y / alpha + np.sqrt((y / alpha) ** 2 + 1.0))
    return z


def anscombe_glynn_kurt_z(x):
    x = _as_1d_finite(x)
    n = x.size
    if n < 5:
        raise ValueError(f'anscombe_glynn_kurt_z requires n >= 5; got n={n}.')

    b2 = sample_kurtosis_pearson(x)
    if not np.isfinite(b2):
        return np.nan

    E = 3.0 * (n - 1) / (n + 1)
    varb2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    x_stat = (b2 - E) / np.sqrt(varb2)

    sqrtbeta1 = (
        6.0
        * (n * n - 5 * n + 2)
        / ((n + 7) * (n + 9))
        * np.sqrt((6.0 * (n + 3) * (n + 5)) / (n * (n - 2) * (n - 3)))
    )
    A = 6.0 + 8.0 / sqrtbeta1 * (2.0 / sqrtbeta1 + np.sqrt(1.0 + 4.0 / (sqrtbeta1**2)))

    term1 = 1.0 - 2.0 / (9.0 * A)
    denom = 1.0 + x_stat * np.sqrt(2.0 / (A - 4.0))
    if denom == 0.0:
        return np.nan

    term2 = np.sign(denom) * (((1.0 - 2.0 / A) / np.abs(denom)) ** (1.0 / 3.0))
    z = (term1 - term2) / np.sqrt(2.0 / (9.0 * A))
    return z


def dagostino_k2_test(x):
    x = _as_1d_finite(x)
    n = x.size
    if n < 8:
        raise ValueError(f'dagostino_k2_test requires n >= 8; got n={n}.')

    g1 = sample_skewness(x)
    b2 = sample_kurtosis_pearson(x)
    z_skew = dagostino_skew_z(x)
    z_kurt = anscombe_glynn_kurt_z(x)

    K2 = z_skew**2 + z_kurt**2
    p_value = float(np.exp(-0.5 * K2))  # chi-square(df=2) survival function

    return {
        'n': int(n),
        'skewness': float(g1),
        'kurtosis_pearson': float(b2),
        'z_skew': float(z_skew),
        'z_kurt': float(z_kurt),
        'K2': float(K2),
        'p_value': p_value,
    }

Quick example: inspect the components#

The two component z-scores tell you why the omnibus statistic is large:

  • large |z_skew| → asymmetry (skewed distribution)

  • large |z_kurt| → tails/peakedness differ from normal

x = rng.normal(0, 1, size=200)
ours = dagostino_k2_test(x)
ours

# Optional SciPy check (if installed); print so the values display even inside try/except
try:
    from scipy.stats import normaltest

    sp = normaltest(x)
    print({"scipy_K2": float(sp.statistic), "scipy_p_value": float(sp.pvalue)})
except ImportError as e:
    print(f"SciPy not available: {e}")

5) Visual diagnostics#

Normality tests reduce a full distribution to a single number, so you should almost always pair \(K^2\) with:

  • a histogram + fitted normal curve

  • a Q–Q plot (straight line suggests normality; curvature suggests tails/skew)

def normal_pdf(x, mu=0.0, sigma=1.0):
    x = np.asarray(x, dtype=float)
    return (1.0 / (sigma * np.sqrt(2.0 * np.pi))) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


def norm_ppf(p):
    """Acklam's rational approximation of the standard normal quantile (NumPy-only)."""
    p = np.asarray(p, dtype=float)
    if np.any((p <= 0) | (p >= 1)):
        raise ValueError('p must be in (0, 1)')

    a = np.array([
        -3.969683028665376e01,
         2.209460984245205e02,
        -2.759285104469687e02,
         1.383577518672690e02,
        -3.066479806614716e01,
         2.506628277459239e00,
    ])
    b = np.array([
        -5.447609879822406e01,
         1.615858368580409e02,
        -1.556989798598866e02,
         6.680131188771972e01,
        -1.328068155288572e01,
    ])
    c = np.array([
        -7.784894002430293e-03,
        -3.223964580411365e-01,
        -2.400758277161838e00,
        -2.549732539343734e00,
         4.374664141464968e00,
         2.938163982698783e00,
    ])
    d = np.array([
         7.784695709041462e-03,
         3.224671290700398e-01,
         2.445134137142996e00,
         3.754408661907416e00,
    ])

    plow = 0.02425
    phigh = 1.0 - plow

    x = np.empty_like(p)

    m1 = p < plow
    if np.any(m1):
        q = np.sqrt(-2.0 * np.log(p[m1]))
        num = (((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5])
        den = ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1.0)
        x[m1] = num / den

    m2 = (p >= plow) & (p <= phigh)
    if np.any(m2):
        q = p[m2] - 0.5
        r = q * q
        num = (((((a[0] * r + a[1]) * r + a[2]) * r + a[3]) * r + a[4]) * r + a[5]) * q
        den = (((((b[0] * r + b[1]) * r + b[2]) * r + b[3]) * r + b[4]) * r + 1.0)
        x[m2] = num / den

    m3 = p > phigh
    if np.any(m3):
        q = np.sqrt(-2.0 * np.log(1.0 - p[m3]))
        num = (((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5])
        den = ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1.0)
        x[m3] = -(num / den)

    return x


def qq_points_standardized(x):
    x = _as_1d_finite(x)
    x = (x - x.mean()) / x.std(ddof=0)
    xs = np.sort(x)
    n = xs.size
    p = (np.arange(1, n + 1) - 0.5) / n
    z = norm_ppf(p)
    return z, xs


def add_hist_with_normal_fit(fig, x, row, col, title):
    x = _as_1d_finite(x)
    mu = x.mean()
    sigma = x.std(ddof=0)
    grid = np.linspace(x.min(), x.max(), 400)

    fig.add_trace(
        go.Histogram(
            x=x,
            nbinsx=45,
            histnorm='probability density',
            opacity=0.65,
            name=title,
        ),
        row=row,
        col=col,
    )
    fig.add_trace(
        go.Scatter(
            x=grid,
            y=normal_pdf(grid, mu=mu, sigma=sigma),
            mode='lines',
            name='Normal fit',
        ),
        row=row,
        col=col,
    )
    fig.update_xaxes(title_text='x', row=row, col=col)
    fig.update_yaxes(title_text='density', row=row, col=col)


def add_qq(fig, x, row, col):
    z, xs = qq_points_standardized(x)
    lo = float(min(z.min(), xs.min()))
    hi = float(max(z.max(), xs.max()))

    fig.add_trace(
        go.Scatter(x=z, y=xs, mode='markers', marker=dict(size=5), name='Q–Q points'),
        row=row,
        col=col,
    )
    fig.add_trace(
        go.Scatter(x=[lo, hi], y=[lo, hi], mode='lines', name='45° line'),
        row=row,
        col=col,
    )
    fig.update_xaxes(title_text='Theoretical quantiles (N(0,1))', row=row, col=col)
    fig.update_yaxes(title_text='Sample quantiles (standardized)', row=row, col=col)


n = 400
x_normal = rng.normal(0, 1, size=n)
x_logn = rng.lognormal(mean=0.0, sigma=0.6, size=n)
x_mix = np.concatenate([rng.normal(-2, 1, size=n // 2), rng.normal(2, 1, size=n - n // 2)])

datasets = {'normal': x_normal, 'lognormal': x_logn, 'mixture': x_mix}
titles = []
for name, x in datasets.items():
    res = dagostino_k2_test(x)
    titles.extend([f"{name}: hist (p={res['p_value']:.3g})", f"{name}: Q–Q"])

fig = make_subplots(rows=3, cols=2, subplot_titles=titles, horizontal_spacing=0.12, vertical_spacing=0.12)

for i, (name, x) in enumerate(datasets.items(), start=1):
    add_hist_with_normal_fit(fig, x, row=i, col=1, title=f'{name} sample')
    add_qq(fig, x, row=i, col=2)

fig.update_layout(height=950, showlegend=False, title_text='Histogram + Q–Q (with K² p-values)')
fig.show()

6) Geometry + chi-square interpretation#

The test first turns skewness and kurtosis into two approximately standard-normal z-scores:

  • \(z_{skew}\)

  • \(z_{kurt}\)

and then combines them: \(K^2 = z_{skew}^2 + z_{kurt}^2\)

So you can think of \(K^2\) as a squared distance from the origin in the plane \((z_{skew}, z_{kurt})\).

Under \(H_0\), \(K^2\) is approximately \(\chi^2\) with 2 degrees of freedom, and the p-value is \(p = P(\chi^2_2 \ge K^2) = \exp(-K^2/2)\).
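Since the p-value in the code relies on this closed form, here is a quick NumPy-only sanity check (a sketch; the interval \([k, k + 200]\) truncates a negligible amount of tail mass):

```python
import numpy as np

# Check P(chi²_2 >= k) == exp(-k/2) by trapezoid integration of the df=2
# density f(t) = 0.5 * exp(-t/2) over [k, k + 200].
k = 3.7
t = np.linspace(k, k + 200.0, 2_000_001)
f = 0.5 * np.exp(-0.5 * t)
h = t[1] - t[0]
tail = h * (f.sum() - 0.5 * (f[0] + f[-1]))

print(tail, np.exp(-0.5 * k))  # the two numbers should agree to many decimals
```

The manual trapezoid rule is used here only to stay NumPy-version-agnostic; any quadrature routine would do.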

def dagostino_components_batch(samples):
    """Return (z_skew, z_kurt) for many samples at once (samples shape: [sims, n])."""
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[1]
    if n < 8:
        raise ValueError('need n >= 8')

    mean = samples.mean(axis=1, keepdims=True)
    d = samples - mean
    m2 = np.mean(d**2, axis=1)
    m3 = np.mean(d**3, axis=1)
    m4 = np.mean(d**4, axis=1)

    g1 = m3 / (m2 ** 1.5)
    b2 = m4 / (m2 ** 2)

    # skew z
    y = g1 * np.sqrt(((n + 1) * (n + 3)) / (6.0 * (n - 2)))
    beta2 = (
        3.0
        * (n**2 + 27 * n - 70)
        * (n + 1)
        * (n + 3)
        / ((n - 2.0) * (n + 5) * (n + 7) * (n + 9))
    )
    W2 = -1.0 + np.sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / np.sqrt(0.5 * np.log(W2))
    alpha = np.sqrt(2.0 / (W2 - 1.0))
    y = np.where(y == 0.0, 1.0, y)
    z_skew = delta * np.log(y / alpha + np.sqrt((y / alpha) ** 2 + 1.0))

    # kurt z
    E = 3.0 * (n - 1) / (n + 1)
    varb2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    x_stat = (b2 - E) / np.sqrt(varb2)

    sqrtbeta1 = (
        6.0
        * (n * n - 5 * n + 2)
        / ((n + 7) * (n + 9))
        * np.sqrt((6.0 * (n + 3) * (n + 5)) / (n * (n - 2) * (n - 3)))
    )
    A = 6.0 + 8.0 / sqrtbeta1 * (2.0 / sqrtbeta1 + np.sqrt(1.0 + 4.0 / (sqrtbeta1**2)))
    term1 = 1.0 - 2.0 / (9.0 * A)
    denom = 1.0 + x_stat * np.sqrt(2.0 / (A - 4.0))
    term2 = np.sign(denom) * np.where(
        denom == 0.0,
        np.nan,
        ((1.0 - 2.0 / A) / np.abs(denom)) ** (1.0 / 3.0),
    )
    z_kurt = (term1 - term2) / np.sqrt(2.0 / (9.0 * A))

    return z_skew, z_kurt


n = 60
sims = 2500

samples_normal = rng.normal(0, 1, size=(sims, n))
samples_logn = rng.lognormal(mean=0.0, sigma=0.6, size=(sims, n))
samples_t3 = rng.standard_t(df=3, size=(sims, n))

# symmetric but non-normal: 0.5*N(-2,1) + 0.5*N(2,1)
sign = rng.choice([-1.0, 1.0], size=(sims, n))
samples_mix = rng.normal(loc=2.0, scale=1.0, size=(sims, n)) * sign

z1_n, z2_n = dagostino_components_batch(samples_normal)
z1_l, z2_l = dagostino_components_batch(samples_logn)
z1_t, z2_t = dagostino_components_batch(samples_t3)
z1_m, z2_m = dagostino_components_batch(samples_mix)

alpha = 0.05
k2_thresh = -2.0 * np.log(alpha)
r = float(np.sqrt(k2_thresh))
theta = np.linspace(0, 2 * np.pi, 400)
circle_x = r * np.cos(theta)
circle_y = r * np.sin(theta)

fig = go.Figure()
fig.add_trace(go.Scatter(x=z1_n, y=z2_n, mode='markers', name='normal', opacity=0.35, marker=dict(size=4)))
fig.add_trace(go.Scatter(x=z1_l, y=z2_l, mode='markers', name='lognormal', opacity=0.35, marker=dict(size=4)))
fig.add_trace(go.Scatter(x=z1_t, y=z2_t, mode='markers', name='t(df=3)', opacity=0.35, marker=dict(size=4)))
fig.add_trace(go.Scatter(x=z1_m, y=z2_m, mode='markers', name='mixture', opacity=0.35, marker=dict(size=4)))
fig.add_trace(go.Scatter(x=circle_x, y=circle_y, mode='lines', name=f'reject @ α={alpha}', line=dict(color='black', dash='dash')))

fig.update_layout(
    title=f'Z-space view: K² = z_skew² + z_kurt² (n={n}, sims={sims} per distribution)',
    xaxis_title='z_skew',
    yaxis_title='z_kurt',
    width=900,
    height=650,
)
fig.update_yaxes(scaleanchor='x', scaleratio=1)
fig.show()

7) Under the null: \(K^2 \sim \chi^2(2)\) and p-values are uniform#

A good self-check is to simulate normal samples and confirm:

  • \(K^2\) matches the \(\chi^2(2)\) shape

  • p-values look approximately Uniform(0,1)

n = 80
sims = 12000
samples = rng.normal(0, 1, size=(sims, n))
z_skew, z_kurt = dagostino_components_batch(samples)
K2 = z_skew**2 + z_kurt**2
pvals = np.exp(-0.5 * K2)

grid = np.linspace(0.0, float(np.quantile(K2, 0.99)), 400)
chi2_pdf = 0.5 * np.exp(-0.5 * grid)  # df=2

fig1 = go.Figure()
fig1.add_trace(go.Histogram(x=K2, nbinsx=60, histnorm='probability density', opacity=0.65, name='simulated K²'))
fig1.add_trace(go.Scatter(x=grid, y=chi2_pdf, mode='lines', name='chi-square(df=2) pdf'))
fig1.update_layout(title='K² under H0 ≈ chi-square(df=2)', xaxis_title='K²', yaxis_title='density', barmode='overlay')
fig1.show()

fig2 = go.Figure()
fig2.add_trace(go.Histogram(x=pvals, nbinsx=50, histnorm='probability density', opacity=0.65, name='p-values'))
fig2.add_trace(go.Scatter(x=[0, 1], y=[1, 1], mode='lines', name='Uniform(0,1) density', line=dict(color='black', dash='dash')))
fig2.update_layout(title='p-values under H0 should be ~Uniform(0,1)', xaxis_title='p-value', yaxis_title='density', barmode='overlay')
fig2.show()

8) Interpreting the result (what it means)#

  • p_value is not “the probability the data are normal”.

  • It is: assuming normality, how surprising is a \(K^2\) at least this large?

  • Small p_value → evidence that skewness and/or kurtosis are inconsistent with a normal sample.

  • Large p_value → you did not find strong evidence against normality (but normality can still be false).

Use the components:

  • large |z_skew| → asymmetry

  • large |z_kurt| → tails/peakedness differ from normal

Reporting example:

D’Agostino \(K^2\) normality test, n=200: \(K^2\)=2.40, p=0.30.

With large n, rejection can be “statistically significant” but practically irrelevant — always inspect the Q–Q plot.
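A reporting line like the example above can be generated directly from the dict returned by `dagostino_k2_test`; `report_k2` is a hypothetical helper, shown here with a hard-coded result dict so it runs on its own:

```python
def report_k2(res, alpha=0.05):
    """Format a dagostino_k2_test result dict as a one-line report (hypothetical helper)."""
    verdict = "reject normality" if res["p_value"] < alpha else "fail to reject normality"
    return (f"D'Agostino K² normality test, n={res['n']}: "
            f"K²={res['K2']:.2f}, p={res['p_value']:.3g} ({verdict} at α={alpha}).")

# Numbers taken from the reporting example above:
print(report_k2({"n": 200, "K2": 2.40, "p_value": 0.30}))
```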

9) Pitfalls + guidance#

  • Large n: tiny, practically irrelevant deviations can produce tiny p-values (high power).

  • Outliers: a few extreme points can dominate skewness/kurtosis and trigger rejection.

  • Dependence: the test assumes i.i.d. samples; autocorrelation (time series) can invalidate p-values.

  • Discrete/rounded data: many ties can cause deviations even when a normal model is a decent approximation.

  • Small n: the approximations are rough; consider Shapiro–Wilk for very small samples and always inspect Q–Q plots.
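To see the outlier sensitivity concretely, a small NumPy sketch showing how a single extreme point inflates both sample moments:

```python
import numpy as np

rng = np.random.default_rng(1)

def skew_kurt(x):
    # Sample skewness and Pearson kurtosis (normal → roughly 0 and 3).
    d = x - x.mean()
    m2 = np.mean(d**2)
    return np.mean(d**3) / m2**1.5, np.mean(d**4) / m2**2

x = rng.normal(size=100)
x_out = np.append(x, 8.0)  # one 8-sigma point among 101 observations

print("clean:   skew=%+.2f  kurtosis=%.2f" % skew_kurt(x))
print("outlier: skew=%+.2f  kurtosis=%.2f" % skew_kurt(x_out))
```

One point is enough to push kurtosis far above 3 (and skewness well away from 0), so \(K^2\) will often reject even though 100 of the 101 observations are exactly normal.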

Diagnostics checklist:

  • for regression diagnostics, test residuals

  • inspect z_skew vs z_kurt to see why it rejects

  • if normality matters for inference, consider robust or resampling-based alternatives
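For the first checklist item, a minimal NumPy-only sketch of the residuals workflow (synthetic data; the coefficients and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Fit OLS with NumPy, then hand the residuals (not the raw response)
# to a normality test such as dagostino_k2_test.
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(0, 1.0, size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

print("fitted beta:", beta)
print("residual mean:", residuals.mean())  # ~0 because the model has an intercept
```

Testing `y` directly would mostly reflect the spread of the regressor; the normality assumption in linear regression is about the error term, which the residuals estimate.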

10) Exercises#

  1. Estimate the empirical Type I error at α=0.05 by simulating many normal samples.

  2. Compare power against lognormal, t(3), and a Gaussian mixture as you vary n.

  3. Fit a linear regression, compute residuals, and run \(K^2\) on the residuals. Does it match your Q–Q plot?

  4. Apply a transform (log, Box–Cox) to a skewed sample and compare z_skew, z_kurt, and p_value.

References#

  • D’Agostino, R. B. (1971). An omnibus test of normality for moderate and large sample size. Biometrika.

  • D’Agostino, R. B. and Pearson, E. S. (1973). Tests for departure from normality. Biometrika.

  • Anscombe, F. J. and Glynn, W. J. (1983). Distribution of the kurtosis statistic b2 for normal samples. Biometrika.

  • SciPy documentation: scipy.stats.normaltest, skewtest, kurtosistest.